1. INTRODUCTION

1.1 Preliminaries

For the estimation of the mean of a finite population from
samples drawn with equal probability and without replacement, the
sample mean is almost universally recommended. Indeed, it has been
shown by Hartley and Rao [1968], and independently in a somewhat
different context by Royall [1968], that among the class of estimators
described as "scale load estimators" it is the only one which is
unbiased uniformly in the parameters of the population. Accordingly,
among the class of uniformly unbiased estimators the sample mean
is the only admissible competitor and is, therefore, "best" in
any competition, including that of minimum variance estimators.

Because the arithmetic mean of a random sample is always
unbiased and because it has a variance that is a function of only
the population variance and the sample size, it is a safe estimator.
That is, even when there is no prior knowledge of the population
distribution one can still be sure of a predictably "good" estimator.
However, there are many occasions when sufficient prior knowledge
of the population is available to limit the class of populations to
those for which the population mean may be more adequately estimated
by some statistic other than the sample mean.


The citations on the following pages will follow the style
of Biometrika.


1.2 Objectives


When the population being sampled is known, a priori, to have
characteristics that place it in a more restricted category, the
question may rightly be asked whether the condition of uniform
unbiasedness might not be dropped and the bias and variance be
combined into a single measure of mean square error. The purpose
of this research is to investigate a class of estimators, which
shall be called "root estimators", that will usually have smaller
mean square error than the arithmetic mean for certain classes of
populations.

Root  estimators,  in  general,  are  of  the  form 
The particular "root estimators" which will be investigated are:

(1) The square-root estimator (Section 2), $\hat{y} = C_1\bar{y} + C_2\bar{u}^2$,
    where $u_i = y_i^{1/2}$, which may be useful in the estimation
    of the means of populations which consist of positive
    quantities only;

(2) The cube-root estimator (Section 3), $\hat{y} = (1-C)\bar{y} + C\bar{v}^3$,
    where $v_i = y_i^{1/3}$, which may be useful in the estimation
    of the means of populations which consist of both positive
    and negative quantities.


Each estimator is to be a weighted sum of the mean of the
observations and the respective j-th power of the mean of the j-th
roots of the observations. It is the values of the $C_i$, the
weighting constants, that are to be optimized.
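The two estimators described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the one-constant form $(1-C)\bar{y} + C\bar{u}^2$ is used for the square-root estimator, and the cube root of a negative observation is taken as the negative real root.

```python
import math


def square_root_estimator(y, c):
    """(1 - C) * mean(y) + C * (mean of square roots)**2, for positive y."""
    n = len(y)
    y_bar = sum(y) / n
    u_bar = sum(math.sqrt(v) for v in y) / n
    return (1 - c) * y_bar + c * u_bar ** 2


def signed_cube_root(v):
    """Real cube root that keeps the sign of v (assumed convention)."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def cube_root_estimator(y, c):
    """(1 - C) * mean(y) + C * (mean of cube roots)**3, for any real y."""
    n = len(y)
    y_bar = sum(y) / n
    v_bar = sum(signed_cube_root(v) for v in y) / n
    return (1 - c) * y_bar + c * v_bar ** 3
```

With C = 0 both functions return the ordinary sample mean, so the weighting constant can be read as the amount of "root" correction applied.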

1.3 Procedure

For a specific known population of values the distribution
of the appropriate roots of these values can be determined
mathematically through well known and established procedures. It is
then possible to express $\hat{y}$ in terms of the k statistics of this
root distribution and through it to determine the bias and error mean
square of the estimator in terms of the $\kappa$ parameters of the root
distribution. It is then possible to investigate the properties
of the root estimator for various sample sizes.

In  many  practical  sampling  applications  the  sampler  does  not 
know  the  exact  form  of  the  population  distribution  but  does  know 
certain  facts  about  it.  In  particular,  he  may  know  that  all  values 
are  positive  and  have  a  large  positive  skewness.  In  another  case, 
he  may  know  that  most  values  are  zero  with  only  an  occasional 
deviation  from  zero  which  may  be  either  positive  or  negative.  In 
order  to  make  a  "root  estimator"  useful  in  such  applications  it 
must  be  determined: 

(1) If there is a broad class of population distributions
    for which a particular value of C will substantially
    reduce the mean square error;

(2) How much loss of efficiency the estimator will suffer
    if the population sampled is not in this class.

These  two  goals  will  be  investigated  through  mathematical 
models,  deriving  the  appropriate  relationships  and  then  applying 
them  to  various  standard  probability  distributions  that  range 
across  a  broad  class  of  population  distributions.  Graphs  of  twelve 
of  these  distributions  are  shown  in  Figure  1.  The  results  will 
then  be  tested  on  real  population  data  for  verification  of 
practicality. 
2. THE SQUARE-ROOT ESTIMATOR

2.1 Introduction

One of the most common types of populations encountered is
made up of values that are all positive. Such distributions are
apt to be skewed positively because they are bounded at the lower
end but not at the upper end, or for other reasons. It is in such
a class of distributions that the square-root estimator will be
tested.

The square-root transformation has a greater effect the
further a number is from 1. It, therefore, has the effect of reducing
the amount of positive skewness while reducing the variance.
Negatively skewed distributions, on the other hand, will have the
skewness emphasized by the square-root transformation.

The square-root estimator will be defined, in general, by
$\hat{y} = C_1\bar{y} + C_2\bar{u}^2$, or if $C_1 + C_2 = 1$, by
$\hat{y} = (1-C)\bar{y} + C\bar{u}^2$. It would be more appropriate to
start with a discussion of the general case, but for reasons of
clarity the general case will be discussed in Section 2.3.


2.2 The Square-Root Estimator of the Form $\hat{y} = (1-C)\bar{y} + C\bar{u}^2$


2.2.1  Definitions 


a. $y_i$, $i = 1, \ldots, n$; a set of observed values picked with
   equal probability and without replacement from a
   population of all positive quantities.

b. $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$; the sample mean.

c. $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i$; the population mean.

d. $V(y) = E(y - \bar{Y})^2$; the variance of the y distribution.

e. $u_i = y_i^{1/2}$.

f. $\bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i$.

g. $\hat{y} = (1-C)\bar{y} + C\bar{u}^2$; the square-root estimator.

h. C; the weighting factor which is to be determined.

i. $V(\hat{y}) = E[\hat{y} - E(\hat{y})]^2$; variance of $\hat{y}$.

j. $B(\hat{y}) = E(\hat{y} - \bar{Y})$; bias of $\hat{y}$ as an estimator of $\bar{Y}$.

k. $EMS(\hat{y}) = E(\hat{y} - \bar{Y})^2 = V(\hat{y}) + [B(\hat{y})]^2$.

l. $R = EMS(\hat{y}) / V(\bar{y})$; efficiency ratio of $EMS(\hat{y})$ over $V(\bar{y})$.

m. k statistics [Kendall and Stuart, Vol. 1, p. 280].

   (1) $k_1 = \frac{1}{n}\sum_{i=1}^{n} u_i = \bar{u}$
   (2) $k_2 = \frac{1}{n-1}\sum_{i=1}^{n} (u_i - \bar{u})^2$

   (3) $k_3 = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n} (u_i - \bar{u})^3$

   (4) $k_4 = \frac{1}{n^{(4)}}\{(n^3+n^2)S_4 - 4(n^2+n)S_3S_1 - 3(n^2-n)S_2^2 + 12nS_2S_1^2 - 6S_1^4\}$

   where $S_j = \sum_{i=1}^{n} u_i^j$ and $n^{(4)} = n(n-1)(n-2)(n-3)$.
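As a hedged illustration of the definitions above, the first four k statistics can be computed from the power sums $S_j$. The function names are hypothetical; the power-sum forms used for $k_2$ and $k_3$ are algebraically equivalent to the deviation forms in items (2) and (3).

```python
def power_sums(u, max_order=4):
    """Return [S_1, ..., S_max_order] with S_j = sum of u_i ** j."""
    return [sum(x ** j for x in u) for j in range(1, max_order + 1)]


def k_statistics(u):
    """First four k statistics of the sample u (requires n >= 4)."""
    n = len(u)
    if n < 4:
        raise ValueError("k_4 needs at least four observations")
    s1, s2, s3, s4 = power_sums(u)
    k1 = s1 / n
    k2 = (n * s2 - s1 ** 2) / (n * (n - 1))
    k3 = (n ** 2 * s3 - 3 * n * s2 * s1 + 2 * s1 ** 3) / (n * (n - 1) * (n - 2))
    k4 = ((n ** 3 + n ** 2) * s4
          - 4 * (n ** 2 + n) * s3 * s1
          - 3 * (n ** 2 - n) * s2 ** 2
          + 12 * n * s2 * s1 ** 2
          - 6 * s1 ** 4) / (n * (n - 1) * (n - 2) * (n - 3))
    return k1, k2, k3, k4
```

Apart from $k_1$, these statistics do not change when a constant is added to every observation, which gives a convenient sanity check on any implementation.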

The use of k statistics is desirable because in each case
$E(k_r) = \kappa_r$; that is, the value one would attain if n = N. The
desirability is further enhanced by the availability of the various
relationships which have been worked out by Wishart [1952].

2.2.2  Optimizing  the  square-root  estimator  for  finite  populations 

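The great advantage of using $\hat{y} = (1-C)\bar{y} + C\bar{u}^2$ is that
$\bar{y}$ can be expressed in terms of k statistics of the square roots ($u_i$),

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} u_i^2 = \frac{1}{n}\left[(n-1)k_2 + n\bar{u}^2\right] = \left(\frac{n-1}{n}\right)k_2 + k_1^2 . \qquad (2.2.1)$$

Hence,

$$\hat{y} = (1-C)\left[\left(\frac{n-1}{n}\right)k_2 + k_1^2\right] + Ck_1^2 = (1-C)\left(\frac{n-1}{n}\right)k_2 + k_1^2 , \qquad (2.2.2)$$

$$\bar{Y} = \kappa_2\left(1 - \frac{1}{N}\right) + \kappa_1^2 \qquad [\text{i.e., } \bar{Y} = \kappa_2 + \kappa_{11}] .$$

Therefore,

$$B(\hat{y}) = E(\hat{y}) - \bar{Y} = -C\kappa_2\left(1 - \frac{1}{n}\right) . \qquad (2.2.3)$$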
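The bias formula (2.2.3) can be checked numerically on a small finite population by enumerating every sample drawn without replacement. The sketch below is illustrative only: the population values and function names are hypothetical, and $\kappa_2$ is taken as the population k statistic of the roots with divisor N - 1.

```python
import math
from itertools import combinations


def exact_bias(population, n, c):
    """Bias of the square-root estimator over all samples of size n."""
    big_n = len(population)
    y_bar_pop = sum(population) / big_n
    estimates = []
    for sample in combinations(population, n):
        y_bar = sum(sample) / n
        u_bar = sum(math.sqrt(v) for v in sample) / n
        estimates.append((1 - c) * y_bar + c * u_bar ** 2)
    return sum(estimates) / len(estimates) - y_bar_pop


def formula_bias(population, n, c):
    """Right-hand side of (2.2.3): -C * kappa_2 * (1 - 1/n)."""
    roots = [math.sqrt(v) for v in population]
    big_n = len(roots)
    u_mean = sum(roots) / big_n
    kappa2 = sum((u - u_mean) ** 2 for u in roots) / (big_n - 1)
    return -c * kappa2 * (1 - 1 / n)
```

For any positive population the two functions should agree to rounding error, since (2.2.3) is an exact identity rather than an approximation.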


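Similarly the variances and covariances of the k statistics
can be determined.

$$V(k_2) = E[k_2 - E(k_2)]^2 = E(k_2^2) - [E(k_2)]^2$$

$$= E\left[\frac{1}{n}k_4 + \frac{n+1}{n-1}k_{22}\right] - \kappa_2^2$$

$$= \frac{1}{n}\kappa_4 + \frac{n+1}{n-1}\kappa_{22} - \left[\frac{1}{N}\kappa_4 + \frac{N+1}{N-1}\kappa_{22}\right]$$

$$V(k_2) = \left(\frac{1}{n} - \frac{1}{N}\right)\kappa_4 + 2\left(\frac{1}{n-1} - \frac{1}{N-1}\right)\kappa_{22} \qquad (2.2.4)$$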


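$$V(k_1^2) = E(k_1^4) - [E(k_1^2)]^2 = E\left[\frac{1}{n}k_2 + k_{11}\right]^2 - \left[E\left(\frac{1}{n}k_2 + k_{11}\right)\right]^2$$

$$= E\left[\frac{1}{n^2}k_2^2 + k_{11}^2 + \frac{2}{n}k_2k_{11}\right] - \left[\frac{1}{n^2}\kappa_2^2 + \kappa_{11}^2 + \frac{2}{n}\kappa_2\kappa_{11}\right]$$

$$= \frac{1}{n^2}\left[\frac{1}{n}\kappa_4 + \frac{n+1}{n-1}\kappa_{22}\right] + \left[\frac{2}{n(n-1)}\kappa_{22} + \frac{4}{n}\kappa_{211} + \kappa_{1111}\right] + \frac{2}{n}\left[\frac{2}{n}\kappa_{31} - \frac{2}{n(n-1)}\kappa_{22} + \kappa_{211}\right]$$

$$\quad - \frac{1}{n^2}\left[\frac{1}{N}\kappa_4 + \frac{N+1}{N-1}\kappa_{22}\right] - \left[\frac{2}{N(N-1)}\kappa_{22} + \frac{4}{N}\kappa_{211} + \kappa_{1111}\right] - \frac{2}{n}\left[\frac{2}{N}\kappa_{31} - \frac{2}{N(N-1)}\kappa_{22} + \kappa_{211}\right]$$

$$V(k_1^2) = \frac{1}{n^2}\left(\frac{1}{n} - \frac{1}{N}\right)\kappa_4 + \frac{4}{n}\left(\frac{1}{n} - \frac{1}{N}\right)\kappa_{31} + 4\left(\frac{1}{n} - \frac{1}{N}\right)\kappa_{211}$$

$$\quad + \left[\frac{2}{n^2}\left(\frac{1}{n-1} - \frac{1}{N-1}\right) + 2\left(1 - \frac{2}{n}\right)\left(\frac{1}{n(n-1)} - \frac{1}{N(N-1)}\right)\right]\kappa_{22} \qquad (2.2.5)$$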


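$$Cov(k_1^2, k_2) = E(k_1^2k_2) - E(k_1^2)\,E(k_2)$$

$$= E\left[\frac{1}{n^2}k_4 + \frac{2}{n}k_{31} + \frac{1}{n}k_{22} + k_{211}\right] - \left[\frac{1}{n}\kappa_2 + \kappa_{11}\right]\kappa_2$$

$$= \frac{1}{n^2}\kappa_4 + \frac{2}{n}\kappa_{31} + \frac{1}{n}\kappa_{22} + \kappa_{211} - \frac{1}{n}\kappa_2^2 - \kappa_{11}\kappa_2$$

$$= \frac{1}{n^2}\kappa_4 + \frac{2}{n}\kappa_{31} + \frac{1}{n}\kappa_{22} + \kappa_{211} - \frac{1}{n}\left[\frac{1}{N}\kappa_4 + \frac{N+1}{N-1}\kappa_{22}\right] - \left[\frac{2}{N}\kappa_{31} - \frac{2}{N(N-1)}\kappa_{22} + \kappa_{211}\right]$$

$$Cov(k_1^2, k_2) = \left(\frac{1}{n} - \frac{1}{N}\right)\left[\frac{1}{n}\kappa_4 - \frac{2}{N-1}\kappa_{22} + 2\kappa_{31}\right] \qquad (2.2.6)$$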
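$$EMS(\hat{y}) = V(\hat{y}) + [B(\hat{y})]^2 = C\left(1 - \frac{1}{n}\right)\Big\{C\left(1 - \frac{1}{n}\right)\kappa_2^2 + \left(\frac{1}{n} - \frac{1}{N}\right)\left[C\left(1 - \frac{1}{n}\right) - 2\right]\kappa_4$$

$$\quad + \left[2(C-2)\left(1 - \frac{1}{n}\right)\left(\frac{1}{n-1} - \frac{1}{N-1}\right) + \frac{4}{N-1}\left(\frac{1}{n} - \frac{1}{N}\right)\right]\kappa_{22}$$

$$\quad - 4\left(\frac{1}{n} - \frac{1}{N}\right)\kappa_{31}\Big\} + V(\bar{y}) \qquad (2.2.9)$$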


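It can be seen in (2.2.9) that, for positive C, EMS($\hat{y}$) is less
than V($\bar{y}$) only if the bracketed term is negative. It is also evident
that EMS($\hat{y}$) is a quadratic in C. This makes minimizing quite
simple by the usual process of equating the first derivative with
respect to C to zero. The derivative is

$$2(C-1)\left(\frac{n-1}{n}\right)^2 V(k_2) - 2\left(\frac{n-1}{n}\right) Cov(k_1^2, k_2) + 2C\left(\frac{n-1}{n}\right)^2\kappa_2^2$$

which, when equated to zero and solved for C, yields

$$C_0 = \frac{\left(\frac{n}{n-1}\right)\left(\frac{1}{n} - \frac{1}{N}\right)\left[\kappa_4 + 2\kappa_{22} + 2\kappa_{31}\right]}{\kappa_2^2 + \left(\frac{1}{n} - \frac{1}{N}\right)\kappa_4 + 2\left(\frac{1}{n-1} - \frac{1}{N-1}\right)\kappa_{22}} , \qquad n > 1 \qquad (2.2.10)$$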


2.2.3  The  square-root  estimator  for  infinite  populations 


Analysis of these relationships is facilitated by examining
the limiting equations as $N \to \infty$. Indeed, if N is moderately large
there is little loss of accuracy by doing so.
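$$\lim_{N\to\infty} V(\bar{y}) = \frac{1}{n}\left\{\kappa_4 + 4\kappa_3\kappa_1 + 4\kappa_2\kappa_1^2 + 2\kappa_2^2\right\} \qquad (2.2.11)$$

$$\lim_{N\to\infty} EMS(\hat{y}) = \frac{C}{n}\left(1 - \frac{1}{n}\right)\left\{\left[C\left(1 - \frac{1}{n}\right) - 2\right]\kappa_4 + \left[C(n+1) - 4\right]\kappa_2^2 - 4\kappa_3\kappa_1\right\} + V(\bar{y}) \qquad (2.2.12)$$

$$\lim_{N\to\infty} C_0 = \frac{\left(\frac{1}{n-1}\right)\left[\kappa_4 + 2\kappa_2^2 + 2\kappa_3\kappa_1\right]}{\frac{1}{n}\kappa_4 + \frac{n+1}{n-1}\kappa_2^2} \qquad (2.2.13)$$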


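The disappearing of such terms as $\kappa_{31}$ is due to the fact
that $\lim_{N\to\infty} \kappa_{31} = \kappa_3\kappa_1$, etc.

Examination is further facilitated by the substitution of the
equivalent central moments of u:

$$\kappa_2 = \mu_2 = E(u - \bar{U})^2$$

$$\kappa_3 = \mu_3 = E(u - \bar{U})^3$$

$$\kappa_4 = \mu_4 - 3\mu_2^2 , \qquad \mu_4 = E(u - \bar{U})^4$$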
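The optimum weight becomes, thereby,

$$C_0 = \frac{\left(\frac{1}{n-1}\right)\left[\mu_4 - 3\mu_2^2 + 2\mu_2^2 + 2\mu_3\bar{U}\right]}{\frac{1}{n}\left(\mu_4 - 3\mu_2^2\right) + \frac{n+1}{n-1}\mu_2^2} = \frac{n\left[\mu_4 - \mu_2^2 + 2\bar{U}\mu_3\right]}{(n-1)\mu_4 + (n^2 - 2n + 3)\mu_2^2} , \qquad n > 1 \qquad (2.2.14)$$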


Noting at this point that $C_0$ is positive if $\mu_4 + 2\bar{U}\mu_3 > \mu_2^2$,
consider the inequality

$$0 < E[(x-\mu)^2 - E(x-\mu)^2]^2 = E(x-\mu)^4 - [E(x-\mu)^2]^2 = \mu_4 - \mu_2^2$$

which shows $\mu_4 > \mu_2^2$. It is evident, then, that $\mu_4 + 2\bar{U}\mu_3 > \mu_2^2$
if $\mu_3 > 0$; that is, if the square-root transformed distribution
has a positive third moment.

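Now, converting EMS($\hat{y}$) to moments about $\bar{U}$:

$$EMS(\hat{y}) = \frac{C}{n}\left(1 - \frac{1}{n}\right)\left\{\left[C\left(1 - \frac{1}{n}\right) - 2\right]\left[\mu_4 - 3\mu_2^2\right] + \left[C(n+1) - 4\right]\mu_2^2 - 4\bar{U}\mu_3\right\} + V(\bar{y})$$

$$= \frac{C}{n}\left(1 - \frac{1}{n}\right)\left\{\left[C\left(1 - \frac{1}{n}\right) - 2\right]\mu_4 + \left[C\left(n + \frac{3}{n} - 2\right) + 2\right]\mu_2^2 - 4\bar{U}\mu_3\right\} + V(\bar{y}) . \qquad (2.2.15)$$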
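Equations (2.2.11), (2.2.14) and (2.2.15) translate directly into code. The sketch below assumes an effectively infinite population; the function names are hypothetical, and the moments passed in are those of the square-root transformed distribution.

```python
def optimum_c(n, u_mean, mu2, mu3, mu4):
    """Optimum weighting constant C_0 from (2.2.14); requires n > 1."""
    numerator = n * (mu4 - mu2 ** 2 + 2 * u_mean * mu3)
    denominator = (n - 1) * mu4 + (n ** 2 - 2 * n + 3) * mu2 ** 2
    return numerator / denominator


def ems_minus_variance(c, n, u_mean, mu2, mu3, mu4):
    """EMS(y-hat) - V(y-bar) from (2.2.15); negative values mean a gain."""
    a = 1 - 1 / n
    braces = ((c * a - 2) * mu4
              + (c * (n + 3 / n - 2) + 2) * mu2 ** 2
              - 4 * u_mean * mu3)
    return (c / n) * a * braces


def variance_of_mean(n, u_mean, mu2, mu3, mu4):
    """V(y-bar) from (2.2.11), with cumulants written as central moments."""
    kappa4 = mu4 - 3 * mu2 ** 2
    return (kappa4 + 4 * mu3 * u_mean + 4 * mu2 * u_mean ** 2 + 2 * mu2 ** 2) / n


def efficiency_ratio(c, n, u_mean, mu2, mu3, mu4):
    """R = EMS(y-hat) / V(y-bar); values below 1 favour the root estimator."""
    v = variance_of_mean(n, u_mean, mu2, mu3, mu4)
    return (ems_minus_variance(c, n, u_mean, mu2, mu3, mu4) + v) / v
```

Because the gain is a quadratic in C that vanishes at C = 0, it also vanishes at C = 2C_0, so any C strictly between 0 and 2C_0 gives R < 1 whenever C_0 is positive.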

It has already been shown that $C_0 > 0$ when $\mu_3 > 0$. Inspection of
(2.2.15) further indicates that a large $\mu_3$ causes a smaller EMS($\hat{y}$);
more evidence that a population which is highly skewed to the right
is best benefitted by the square-root estimator.


2.2.4  The  bias  of  the  square-root  estimator 


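The bias of $\hat{y}$ at $C_0$ is

$$B(\hat{y}) = -C_0\left(1 - \frac{1}{n}\right)\kappa_2 = -\kappa_2\,\frac{\kappa_4 + 2\kappa_2^2 + 2\kappa_3\kappa_1}{\kappa_4 + \frac{n(n+1)}{n-1}\kappa_2^2} = -\mu_2\,\frac{\frac{\mu_4}{\mu_2^2} - 1 + 2\frac{\bar{U}\mu_3}{\mu_2^2}}{\frac{\mu_4}{\mu_2^2} - 3 + \frac{n(n+1)}{n-1}}$$

which decreases with increasing n, but at a very slow rate when n
is small. For example, when n = 2, $\frac{n(n+1)}{n-1} = 6$. In order to double
this value it is necessary to make n = 10. Unless $\mu_4/\mu_2^2$ is small,
even doubling does not halve the bias.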
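The slow decline of the bias with n can be made concrete with a short calculation. This is a sketch under the infinite-population formulas; the function names are hypothetical.

```python
def sample_size_term(n):
    """The only n-dependent term in the bias at C_0: n(n+1)/(n-1)."""
    return n * (n + 1) / (n - 1)


def bias_at_optimum(n, u_mean, mu2, mu3, mu4):
    """Bias of the square-root estimator when C = C_0 (moment form)."""
    kurtosis_ratio = mu4 / mu2 ** 2
    numerator = kurtosis_ratio - 1 + 2 * u_mean * mu3 / mu2 ** 2
    denominator = kurtosis_ratio - 3 + sample_size_term(n)
    return -mu2 * numerator / denominator


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, round(sample_size_term(n), 3))
```

The term equals 6 at n = 2 and is only about 12.2 at n = 10, which is why a fivefold increase in sample size is needed merely to double it.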


2.2.5 Types of distributions for which EMS($\hat{y}$) can be made
substantially less than V($\bar{y}$)

The investigations of the types of distribution functions for
which the error mean square can be substantially reduced will be
facilitated by the following two theorems.

Theorem 1. The value of $C_0$ is invariant to the scale parameter (a
multiplicative constant).
Proof: Let $y_i$ be distributed as $f(y)$, and let $y_i^* = by_i$, so that

$$u_i^* = \sqrt{y_i^*} = \sqrt{by_i} = \sqrt{b}\,u_i .$$

Then

$$\bar{y}^* = \frac{1}{n}\sum y_i^* = \frac{b}{n}\sum y_i = b\bar{y}$$

and similarly

$$\bar{Y}^* = b\bar{Y}$$

$$\bar{u}^* = \frac{1}{n}\sum u_i^* = \frac{1}{n}\sum \sqrt{b}\,u_i = \sqrt{b}\,\bar{u}$$

$$\hat{y}^* = (1-C)\bar{y}^* + C\bar{u}^{*2} = b(1-C)\bar{y} + bC\bar{u}^2 = b\hat{y}$$

$$B(\hat{y}^*) = E[\hat{y}^* - \bar{Y}^*] = bE[\hat{y} - \bar{Y}] = bB(\hat{y}) .$$

Then

$$V(\hat{y}^*) = V(b\hat{y}) = b^2V(\hat{y})$$

and

$$EMS(\hat{y}^*) = b^2V(\hat{y}) + b^2[B(\hat{y})]^2 = b^2[EMS(\hat{y})] .$$

Therefore, the value of EMS($\hat{y}^*$) will be minimized by minimizing
EMS($\hat{y}$), which is accomplished by $C_0$.

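Theorem 2. The efficiency ratio of EMS($\hat{y}$), (R), is invariant to the
scale parameter.

Proof: Let $y_i$ be distributed as $f(y)$ and V($\bar{y}$) be the variance
of $\bar{y}$ for a sample of size n.

Then

$$R = \frac{EMS(\hat{y})}{V(\bar{y})} = \frac{V(\hat{y})}{V(\bar{y})} + \frac{[B(\hat{y})]^2}{V(\bar{y})} .$$

Now, letting $y_i^* = by_i$ as in Theorem 1,

$$V(\bar{y}^*) = V(b\bar{y}) = b^2V(\bar{y})$$

$$V(\hat{y}^*) = V(b\hat{y}) = b^2V(\hat{y})$$

$$B(\hat{y}^*) = bB(\hat{y}) .$$

Then

$$R^* = \frac{EMS(\hat{y}^*)}{V(\bar{y}^*)} = \frac{V(\hat{y}^*)}{V(\bar{y}^*)} + \frac{[B(\hat{y}^*)]^2}{V(\bar{y}^*)} = \frac{b^2V(\hat{y})}{b^2V(\bar{y})} + \frac{b^2[B(\hat{y})]^2}{b^2V(\bar{y})} = R .$$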

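Theorem 3. The values of $C_0$ and R are not invariant to the
position parameter. That is, a constant added to
every element of the population will change the
value of $C_0$ and R.

Proof: Let $y \sim f(y)$ such that $E(y) = \bar{Y}$ and $V(y) = \sigma^2$.
Then, for a simple random sample of size n, $E(\bar{y}) = \bar{Y}$ and $V(\bar{y}) = \sigma^2/n$.
Letting $y_i^* = y_i + b$, then $E(y^*) = \bar{Y} + b$ and $V(y^*) = \sigma^2$. Again,
for a simple random sample of size n, $E(\bar{y}^*) = \bar{Y} + b$ and $V(\bar{y}^*) = \sigma^2/n$;
that is, there is no change in the variance of the unbiased
estimator.

But, letting $u_i^* = \sqrt{y_i^*}$ and $\hat{y}^* = (1-C)\bar{y}^* + C\bar{u}^{*2}$, we see that

$$E(\hat{y}^*) = (1-C)[\bar{Y} + b] + CE(k_1^{*2})$$

$$= (1-C)(\bar{Y} + b) + C\left[\frac{\kappa_2^*}{n} + \kappa_1^{*2}\right]$$

$$= [\bar{Y} + b] - C\left[\bar{Y} + b - \frac{1}{n}\kappa_2^* - \kappa_1^{*2}\right]$$

$$= [\bar{Y} + b] - C\left[\kappa_2 + \kappa_1^2 + b - \frac{1}{n}\kappa_2^* - \kappa_1^{*2}\right]$$

$$= [\bar{Y} + b] - C\left[\left(\kappa_2 - \frac{1}{n}\kappa_2^*\right) + \left(\kappa_1^2 - \kappa_1^{*2}\right) + b\right]$$

$$= [\bar{Y} + b] - C\left(\frac{n-1}{n}\right)\kappa_2 - C\left[\frac{1}{n}\left(\kappa_2 - \kappa_2^*\right) + \left(\kappa_1^2 - \kappa_1^{*2}\right) + b\right] .$$

Therefore, the bias of $\hat{y}^*$ as an estimate of $\bar{Y} + b$ is

$$B(\hat{y}^*) = B(\hat{y}) - C\left[\frac{1}{n}\left(\kappa_2 - \kappa_2^*\right) + \left(\kappa_1^2 - \kappa_1^{*2}\right) + b\right] .$$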

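Now, since

$$\hat{y}^* = (1-C)\bar{y}^* + C\bar{u}^{*2}$$

$$= (1-C)(\bar{y} + b) + C\bar{u}^{*2}$$

$$= (1-C)\bar{y} + C\bar{u}^{*2} + (1-C)b$$

$$= (1-C)\left[\left(\frac{n-1}{n}\right)k_2 + k_1^2\right] + Ck_1^{*2} + (1-C)b ,$$

we have

$$V(\hat{y}^*) = (1-C)^2\left(\frac{n-1}{n}\right)^2V(k_2) + (1-C)^2V(k_1^2) + C^2V(k_1^{*2})$$

$$\quad + 2(1-C)^2\left(\frac{n-1}{n}\right)Cov(k_2, k_1^2) + 2C(1-C)\left(\frac{n-1}{n}\right)Cov(k_2, k_1^{*2}) + 2C(1-C)\,Cov(k_1^2, k_1^{*2})$$

$$= (1-C)^2\left(\frac{n-1}{n}\right)^2V(k_2) + V(k_1^2) + C(C-2)V(k_1^2) + C^2V(k_1^{*2})$$

$$\quad + 2(1-C)\left(\frac{n-1}{n}\right)Cov(k_2, k_1^2) - 2C(1-C)\left(\frac{n-1}{n}\right)Cov(k_2, k_1^2)$$

$$\quad + 2C(1-C)\left(\frac{n-1}{n}\right)Cov(k_2, k_1^{*2}) + 2C(1-C)\,Cov(k_1^2, k_1^{*2})$$

$$= V(\hat{y}) + C(C-2)V(k_1^2) + C^2V(k_1^{*2}) + 2C(1-C)\,Cov(k_1^2, k_1^{*2})$$

$$\quad - 2C(1-C)\left(\frac{n-1}{n}\right)\left\{Cov(k_2, k_1^2) - Cov(k_2, k_1^{*2})\right\} .$$

And then

$$EMS(\hat{y}^*) = V(\hat{y}^*) + [B(\hat{y}^*)]^2$$

$$= V(\hat{y}) + C(C-2)V(k_1^2) + C^2V(k_1^{*2}) + 2C(1-C)\,Cov(k_1^2, k_1^{*2})$$

$$\quad - 2C(1-C)\left(\frac{n-1}{n}\right)\left\{Cov(k_2, k_1^2) - Cov(k_2, k_1^{*2})\right\}$$

$$\quad + [B(\hat{y})]^2 + C^2\left[\frac{1}{n}\left(\kappa_2 - \kappa_2^*\right) + \left(\kappa_1^2 - \kappa_1^{*2}\right) + b\right]^2$$

$$\quad - 2B(\hat{y})\,C\left[\frac{1}{n}\left(\kappa_2 - \kappa_2^*\right) + \left(\kappa_1^2 - \kappa_1^{*2}\right) + b\right] .$$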
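So,

$$EMS(\hat{y}^*) = EMS(\hat{y}) + C(C-2)V(k_1^2) + C^2V(k_1^{*2}) + 2C(1-C)\,Cov(k_1^2, k_1^{*2})$$

$$\quad - 2C(1-C)\left(\frac{n-1}{n}\right)\left\{Cov(k_2, k_1^2) - Cov(k_2, k_1^{*2})\right\}$$

$$\quad + C^2\left[\frac{1}{n}\left(\kappa_2 - \kappa_2^*\right) + \left(\kappa_1^2 - \kappa_1^{*2}\right) + b\right]^2$$

$$\quad - 2B(\hat{y})\,C\left[\frac{1}{n}\left(\kappa_2 - \kappa_2^*\right) + \left(\kappa_1^2 - \kappa_1^{*2}\right) + b\right] .$$

It can be seen from this equation that EMS($\hat{y}^*$) is not equal
to EMS($\hat{y}$), and that the value of C which will minimize it will be
a function of b. These facts, coupled with the fact that
$V(\bar{y}^*) = V(\bar{y})$, are sufficient to show that

$$R^* = \frac{EMS(\hat{y}^*)}{V(\bar{y}^*)} \ne \frac{EMS(\hat{y})}{V(\bar{y})} = R .$$

An example later in this section will further illustrate this
point.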
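Theorems 1 to 3 can be illustrated by brute force on a small finite population: compute R exactly by enumerating all samples, then repeat after multiplying or shifting every population value. The sketch below is only a numerical illustration, not a proof; the population values and the function name are hypothetical.

```python
import math
from itertools import combinations


def exact_efficiency_ratio(population, n, c):
    """R = EMS(y-hat) / V(y-bar), enumerating all samples without replacement."""
    big_n = len(population)
    pop_mean = sum(population) / big_n
    ems_total = 0.0
    var_total = 0.0
    count = 0
    for sample in combinations(population, n):
        y_bar = sum(sample) / n
        u_bar = sum(math.sqrt(v) for v in sample) / n
        estimate = (1 - c) * y_bar + c * u_bar ** 2
        ems_total += (estimate - pop_mean) ** 2
        var_total += (y_bar - pop_mean) ** 2
        count += 1
    return (ems_total / count) / (var_total / count)


if __name__ == "__main__":
    base = [0.2, 0.5, 1.0, 2.0, 4.5, 9.0]
    print("original:", exact_efficiency_ratio(base, 2, 2.0))
    print("scaled  :", exact_efficiency_ratio([7 * v for v in base], 2, 2.0))
    print("shifted :", exact_efficiency_ratio([v + 1 for v in base], 2, 2.0))
```

The scaled population should reproduce the original ratio to rounding error, while the shifted population should not.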

Theorems 1 and 2 will allow investigations of such distributions
as

$$f(y) = \frac{1}{\alpha!\,\beta^{\alpha+1}}\,y^{\alpha}e^{-y/\beta}$$

by letting $\beta = 1$, and then applying the results with equal effect
to the same distribution with any other value of $\beta$. On the other
hand, distributions that differ by an additive constant will not
have the same optimum value of C, nor will the square-root estimator
have equal efficiencies on these distributions.

In order to investigate the efficiency and utility of the
square-root estimator, three specific families of distributions
were examined. These families were chosen because they represent
a wide spectrum of population forms.

(1) The gamma distributions; $f(y) = \frac{1}{\alpha!}\,y^{\alpha}e^{-y}$, $0 \le y < \infty$.

Due to Theorems 1 and 2 any results applicable to these
distributions will also be applicable to

$$f(y) = \frac{1}{\alpha!\,\beta^{\alpha+1}}\,y^{\alpha}e^{-y/\beta}$$

for any value of $\beta$.

Making the transformation $u = y^{1/2}$ yields

$$f(u) = \frac{2}{\alpha!}\,u^{2\alpha+1}e^{-u^2}$$

from which the first four central moments can be calculated for
various values of $\alpha$. We shall consider the distributions generated
by $\alpha$ = 0, 1, 2, and 3. These values are convenient because
$$f(y) = e^{-y}$$

is fairly skewed, while

$$f(y) = \frac{1}{6}\,y^3e^{-y}$$

is rather symmetrical with only a slight positive skewness.
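For the gamma family the moments of the root distribution have a closed form, since $E(u^r) = E(y^{r/2}) = \Gamma(\alpha + 1 + r/2)/\Gamma(\alpha + 1)$ when $\beta = 1$. The following sketch (hypothetical function names, infinite-population formula (2.2.14)) computes the central moments of u and the resulting optimum weight.

```python
import math


def gamma_root_moments(alpha):
    """Mean and central moments (mu2, mu3, mu4) of u = sqrt(y) for the
    gamma density y**alpha * exp(-y) / alpha! (scale parameter 1)."""
    m1, m2, m3, m4 = (math.gamma(alpha + 1 + r / 2) / math.gamma(alpha + 1)
                      for r in range(1, 5))
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return m1, mu2, mu3, mu4


def optimum_c(n, u_mean, mu2, mu3, mu4):
    """C_0 from (2.2.14), valid for n > 1."""
    return (n * (mu4 - mu2 ** 2 + 2 * u_mean * mu3)
            / ((n - 1) * mu4 + (n ** 2 - 2 * n + 3) * mu2 ** 2))


if __name__ == "__main__":
    for alpha in (0, 1, 2, 3):
        moments = gamma_root_moments(alpha)
        print(alpha, [round(optimum_c(n, *moments), 3) for n in (2, 3, 5, 10)])
```

The printed values are what these formulas give under the stated assumptions; they are offered as a cross-check against the tabulated results rather than as a replacement for them.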

Figures 2(a), 2(b), 2(c) and 2(d) show the relative efficiencies
(R) of a number of different distributions for n = 2, 3, 5 and 10,
respectively. The top four of these efficiency curves (numbers 1,
2, 3, 4) are of the gamma distribution with $\alpha$ = 3, 2, 1 and 0,
respectively. The accuracy to which these graphs can be read is
sufficient for practical purposes. Exact values of R and $C_0$ for
the various distributions are given in Table 1.

It can be seen that the more symmetrical the parent distribution
the less gain attainable. However, it should also be noticed
that for values of C between 0 and 2.5 there is, in every case,
some improvement over V($\bar{y}$). This is an important fact as it indicates
that the square-root estimator will give an improvement in mean
square error for any value of C between 0 and 2.5 as long as the
parent distribution is at least as skewed as

$$f(y) = \frac{1}{6}\,y^3e^{-y} .$$

To illustrate Theorem 3, consider the distribution

$$f(y) = e^{-(y-1)} , \qquad 1 \le y < \infty .$$

[Figures 2(a) to 2(d). Relative efficiency (R) as a function of the
weighting constant (C) for the square-root estimator, n = 2, 3, 5 and 10.]

TABLE 1. VALUES OF C FOR WHICH EMS($\hat{y}$) < V($\bar{y}$) FOR THIRTEEN DISTRIBUTIONS;
This is exactly the same distribution as

$$f(y) = e^{-y}$$

except that 1 has been added to each population value. The values
shown below illustrate the differences caused by this shift.

One need not fear dire consequences because of such
differences, however. If, in each case, a value of C = 2 had been
used, the efficiencies attained would have been .86 and .78,
respectively.
(2) The Wishart distributions.

This group was chosen in order to investigate distributions
with a bit more skewness than the gamma. In fact, the square-root
transformation of these distributions generates gamma distributions,
g(u), for which the moments are readily calculated.

The effectiveness of the square-root estimator for these
distributions (numbers 5, 6 and 7) for n = 2, 3, 5 and 10 is also
shown in Figures 2(a), 2(b), 2(c) and 2(d). Again it can be
seen that the more skewed parent distributions offer greater gains
through the square-root estimator. Equally important is the fact
that any value of C from 0 to 4 will produce a gain in efficiency
for these distributions, with $C_0 = 2$ being the optimum value.

(3) The Pareto distributions; $f(y) = \frac{\alpha\beta^{\alpha}}{y^{\alpha+1}}$; $0 < \beta$, $\beta < y$,
for which $\beta$ is a scale parameter.

The Pareto distributions are reputed to be appropriate for
income distributions and similar cases. These were included to
show what happens to the square-root estimator in such extraordinary
cases. The square-root transformation, $u = y^{1/2}$, produces

$$g(u) = 2\alpha\,u^{-(2\alpha+1)} , \qquad u > 1 .$$
Moments higher than $2\alpha - 1$ do not converge, so the distribution
was truncated to force convergence.

In the cases of $f(y) = y^{-2}$ and $f(y) = 2y^{-3}$ the square-root
estimator makes even greater gains of efficiency over $\bar{y}$. It is,
however, at an increasing value of C, with the maximum gain
for $f(y) = 2y^{-3}$ at C = 3. If such a value of C were being utilized
and the distribution was, in reality, a gamma with $\alpha = 3$ (number 1),
there would be a loss of information of approximately 4%.

2.2.6  General  Comments 

Inspection of Figures 2(c) and 2(d) readily illustrates that
for larger sample sizes some gains are realized, but two
important facts should be noted. As the sample size increases the
efficiency of the square-root estimator over $\bar{y}$ becomes less and
the value of $C_0$ approaches zero for all distributions, indicating
that the primary uses of the square-root estimator are cases where
small sample sizes are necessary.

The following three properties are quite important to the
usefulness of the square-root estimator.

(1) EMS($\hat{y}$) is quadratic in C of the form
    $h(C) = aC^2 + bC + d$; $a > 0$.

(2) EMS($\hat{y}$) = V($\bar{y}$) when C = 0.

(3) EMS($\hat{y}$) < V($\bar{y}$) only when C > 0 for positive populations
    that yield $\mu_3 > 0$.

The implications of these properties are:

(1) For positive populations for which $\mu_3 > 0$ only values
    of C greater than 0 and less than $2C_0$ will produce a
    gain in efficiency, and any value of C in this range
    will produce a gain.

(2) If it is known that the population being sampled
    is at least as skew as one of the standard distributions,
    a value of C can be established which will guarantee
    that the square-root estimator will be more efficient
    than $\bar{y}$.

2.2.7  A  simulation  to  verify  the  efficiency  of  the  square-root 
estimator 

The efficiency curves identified by (14) in Figures 2(a),
2(b), 2(c) and 2(d) are for a set of data from Cochran [1953].
These data are, actually, a sample of 200 sizes of cities in the
United States in 1920 and are reproduced as Table 2(a). The city
sizes are grouped into categories of an interval width of 100,000
and the mid-points of the categories were used as representation
of the entire category. To facilitate calculation the sizes have
been coded by dividing by 50,000.
For this demonstration the 200 cities are taken to be a
population of 200 which has a mean value of 2.66 (2.66 x 50,000)
and a variance of 12.564. Letting $u_i$ be the square root of the
i-th category size, the following central moments of the square-root
distribution were calculated:

$$\bar{U} = 1.437$$

$$\mu_2 = .595$$

$$\mu_3 = .953$$

$$\mu_4 = 2.523$$

It was through the use of these values substituted into equation
(2.2.15) and varying the value of C that the efficiency curve
was generated.
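The population parameters quoted above can be recomputed from the grouped data of Table 2(a). The sketch below assumes the coded sizes and frequencies as read from that table and uses divisor N for all moments; the variable and function names are illustrative.

```python
import math

# Coded city sizes (units of 50,000) and frequencies, as read from Table 2(a).
SIZES = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
FREQS = [133, 36, 11, 5, 4, 4, 0, 4, 0, 1, 2]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def central_moment(values, weights, order):
    """Frequency-weighted central moment with divisor N."""
    mean = weighted_mean(values, weights)
    return sum(w * (v - mean) ** order for v, w in zip(values, weights)) / sum(weights)


N = sum(FREQS)
Y_BAR = weighted_mean(SIZES, FREQS)
VAR_Y = central_moment(SIZES, FREQS, 2)

ROOTS = [math.sqrt(y) for y in SIZES]
U_BAR = weighted_mean(ROOTS, FREQS)
MU2, MU3, MU4 = (central_moment(ROOTS, FREQS, r) for r in (2, 3, 4))

if __name__ == "__main__":
    print(N, round(Y_BAR, 3), round(VAR_Y, 3))
    print(round(U_BAR, 3), round(MU2, 3), round(MU3, 3), round(MU4, 3))
```

Small differences in the last digit relative to the quoted moments are to be expected, since the original values were presumably computed from rounded square roots.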

With such a population as this it is not difficult to determine
every possible sample of size n = 2 and to calculate $\hat{y}$. For
instance, if C = 2 the equation for the square-root estimator is

$$\hat{y} = -\bar{y} + 2\bar{u}^2 = \sqrt{y_1y_2} .$$

The use of C = 2 was chosen because, by referring to Figure 2(a),
it can be seen that it is a very safe value to use when n = 2.
The frequency distribution of the various values of $\hat{y}$ is shown
in Table 2(b).

Calculation of

$$EMS(\hat{y}) = \frac{1}{\sum_i f_i}\sum_i f_i(\hat{y}_i - 2.66)^2$$

shows that EMS($\hat{y}$) = 3.3. That makes the efficiency factor

$$\frac{EMS(\hat{y})}{V(\bar{y})} = \frac{3.3}{(12.564/2)} = .525$$

which agrees with the theoretical value within rounding error.

The expectation of $\hat{y}$ calculated from Table 2(b) is

$$E(\hat{y}) = 2.073$$

which makes the bias

$$B(\hat{y}) = 2.073 - 2.660 = -.593 .$$


According to equation (2.2.3)

$$B(\hat{y}) = -C\left(1 - \frac{1}{n}\right)\kappa_2 = -2\left(\frac{1}{2}\right)(.595) = -.595$$
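The enumeration described in this section can be reproduced directly from the grouped frequencies. The sketch below is an illustration under stated assumptions: the frequencies are those read from Table 2(a), samples are unordered pairs of distinct cities, and C = 2 so that the estimate for a pair is the square root of the product of the two coded sizes. Results from such an exact enumeration need not reproduce the hand-computed figures quoted in the text to every digit.

```python
import math

SIZES = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
FREQS = [133, 36, 11, 5, 4, 4, 0, 4, 0, 1, 2]


def enumerate_pairs(sizes, freqs):
    """Return (estimate, number of samples) for every pair of categories."""
    pairs = []
    for i, (yi, fi) in enumerate(zip(sizes, freqs)):
        for j in range(i, len(sizes)):
            yj, fj = sizes[j], freqs[j]
            count = fi * (fi - 1) // 2 if i == j else fi * fj
            pairs.append((math.sqrt(yi * yj), count))
    return pairs


N = sum(FREQS)
Y_BAR = sum(y * f for y, f in zip(SIZES, FREQS)) / N
PAIRS = enumerate_pairs(SIZES, FREQS)
TOTAL = sum(count for _, count in PAIRS)

MEAN_EST = sum(est * count for est, count in PAIRS) / TOTAL
BIAS = MEAN_EST - Y_BAR
EMS = sum(count * (est - Y_BAR) ** 2 for est, count in PAIRS) / TOTAL

# Population kappa_2 of the square roots (divisor N - 1), for comparison
# with the bias formula -C(1 - 1/n) * kappa_2 = -kappa_2 when C = 2, n = 2.
U_BAR = sum(math.sqrt(y) * f for y, f in zip(SIZES, FREQS)) / N
KAPPA2 = sum(f * (math.sqrt(y) - U_BAR) ** 2 for y, f in zip(SIZES, FREQS)) / (N - 1)

if __name__ == "__main__":
    print(TOTAL, round(MEAN_EST, 3), round(BIAS, 3), round(EMS, 3), round(KAPPA2, 3))
```

Working with category pairs rather than the 200 individual cities keeps the loop to 66 terms while still counting every one of the possible samples.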
TABLE 2(a). DISTRIBUTION OF SIZES
OF 200 U.S. CITIES IN 1920

Population size
(x 50,000), y       u = sqrt(y)       f

      1               1.000          133
      3               1.732           36
      5               2.236           11
      7               2.646            5
      9               3.000            4
     11               3.317            4
     13               3.606            0
     15               3.873            4
     17               4.123            0
     19               4.359            1
     21               4.583            2

Original population parameters:
    $\bar{Y}$ = 2.66, $\sigma^2$ = 12.564

Square-root transformed population parameters:
    $\bar{U}$ = 1.437, $\mu_2$ = .595, $\mu_3$ = .953, $\mu_4$ = 2.523

[Source: Cochran (1953) p. 39]
TABLE 2(b). DISTRIBUTION OF SQUARE-ROOT ESTIMATOR
FOR ALL SAMPLES OF SIZE n = 2 FROM SIZES
OF 200 U.S. CITIES IN 1920

Sample   $\hat{y}$        f    |  Sample   $\hat{y}$       f   |  Sample   $\hat{y}$       f

 1,1     1.000   8,778   |   5,7     5.916   55   |   9,21   13.748    8
 1,3     1.732   4,788   |   5,9     6.708   44   |  11,11   11.000    6
 1,5     2.236   1,463   |   5,11    7.416   44   |  11,13   11.958    0
 1,7     2.646     665   |   5,13    8.062    0   |  11,15   12.845   16
 1,9     3.000     532   |   5,15    8.660   44   |  11,17   13.675    0
 1,11    3.317     532   |   5,17    9.220    0   |  11,19   14.475    4
 1,13    3.606       0   |   5,19    9.747   11   |  11,21   15.199    8
 1,15    3.873     532   |   5,21   10.247   22   |  13,13   13.000    0
 1,17    4.123       0   |   7,7     7.000   10   |  13,15   13.964    0
 1,19    4.359     133   |   7,9     7.937   20   |  13,17   14.866    0
 1,21    4.583     266   |   7,11    8.775   20   |  13,19   15.716    0
 3,3     3.000     630   |   7,13    9.539    0   |  13,21   16.523    0
 3,5     3.873     396   |   7,15   10.247   20   |  15,15   15.000    6
 3,7     4.583     180   |   7,17   10.909    0   |  15,17   15.969    0
 3,9     5.196     144   |   7,19   11.533    5   |  15,19   16.882    4
 3,11    5.745     144   |   7,21   12.124   10   |  15,21   17.748    8
 3,13    6.245       0   |   9,9     9.000    6   |  17,17   17.000    0
 3,15    6.708     144   |   9,11    9.950   16   |  17,19   17.972    0
 3,17    7.141       0   |   9,13   10.817    0   |  17,21   18.894    0
 3,19    7.550      36   |   9,15   11.619   16   |  19,19   19.000    0
 3,21    7.937      72   |   9,17   12.369    0   |  19,21   19.975    2
 5,5     5.000      55   |   9,19   13.077    4   |  21,21   21.000    1

E($\hat{y}$) = 2.073; B($\hat{y}$) = 2.073 - 2.660 = -.593

MSE($\hat{y}$) = 3.3

which again shows agreement within rounding error.
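$$E(\hat{y}) = \bar{Y} + \left(C_1 + \frac{C_2}{n} - 1\right)\kappa_2 + (C_1 + C_2 - 1)\kappa_{11}$$

$$B(\hat{y}) = -\left[\left(1 - C_1 - \frac{C_2}{n}\right)\kappa_2 + (1 - C_1 - C_2)\kappa_{11}\right] \qquad (2.3.2)$$

$$V(\hat{y}) = C_1^2\left(\frac{n-1}{n}\right)^2 V(k_2) + (C_1 + C_2)^2\,V(k_1^2) + 2C_1(C_1 + C_2)\left(\frac{n-1}{n}\right) Cov(k_2, k_1^2) \qquad (2.3.3)$$

$$EMS(\hat{y}) = V(\hat{y}) + [B(\hat{y})]^2$$

$$\frac{\partial EMS(\hat{y})}{\partial C_1} = 2C_1\left(\frac{n-1}{n}\right)^2 V(k_2) + 2(C_1 + C_2)\,V(k_1^2) + 2(2C_1 + C_2)\left(\frac{n-1}{n}\right) Cov(k_2, k_1^2)$$

$$\quad - 2\left[\left(1 - C_1 - \frac{C_2}{n}\right)\kappa_2 + (1 - C_1 - C_2)\kappa_{11}\right]\left(\kappa_2 + \kappa_{11}\right)$$

Equating to zero and isolating $C_1$ and $C_2$:

$$C_1\left[\left(\frac{n-1}{n}\right)^2 V(k_2) + V(k_1^2) + 2\left(\frac{n-1}{n}\right) Cov(k_2, k_1^2) + (\kappa_2 + \kappa_{11})^2\right]$$

$$\quad + C_2\left[V(k_1^2) + \left(\frac{n-1}{n}\right) Cov(k_2, k_1^2) + \left(\frac{\kappa_2}{n} + \kappa_{11}\right)(\kappa_2 + \kappa_{11})\right] = (\kappa_2 + \kappa_{11})^2$$

$$\frac{\partial EMS(\hat{y})}{\partial C_2} = 2(C_1 + C_2)\,V(k_1^2) + 2C_1\left(\frac{n-1}{n}\right) Cov(k_2, k_1^2)$$

$$\quad - 2\left[\left(1 - C_1 - \frac{C_2}{n}\right)\kappa_2 + (1 - C_1 - C_2)\kappa_{11}\right]\left(\frac{\kappa_2}{n} + \kappa_{11}\right)$$

Again, equating to zero and isolating $C_1$ and $C_2$:

$$C_1\left[V(k_1^2) + \left(\frac{n-1}{n}\right) Cov(k_2, k_1^2) + (\kappa_2 + \kappa_{11})\left(\frac{\kappa_2}{n} + \kappa_{11}\right)\right]$$

$$\quad + C_2\left[V(k_1^2) + \left(\frac{\kappa_2}{n} + \kappa_{11}\right)^2\right] = (\kappa_2 + \kappa_{11})\left(\frac{\kappa_2}{n} + \kappa_{11}\right) .$$

Letting

$$E_1 = (\kappa_2 + \kappa_{11})^2$$

$$E_2 = (\kappa_2 + \kappa_{11})\left(\frac{\kappa_2}{n} + \kappa_{11}\right)$$

$$A = \left(\frac{n-1}{n}\right)^2 V(k_2) + V(k_1^2) + 2\left(\frac{n-1}{n}\right) Cov(k_2, k_1^2) + E_1$$

$$B = V(k_1^2) + \left(\frac{n-1}{n}\right) Cov(k_2, k_1^2) + E_2$$

$$D = V(k_1^2) + \left(\frac{\kappa_2}{n} + \kappa_{11}\right)^2$$

then

$$AC_1 + BC_2 = E_1$$

and

$$BC_1 + DC_2 = E_2 .$$

Solving simultaneously yields

$$C_1 = \frac{DE_1 - BE_2}{AD - B^2} \qquad (2.3.4)$$

$$C_2 = \frac{AE_2 - BE_1}{AD - B^2} . \qquad (2.3.5)$$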
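Once A, B, D, $E_1$ and $E_2$ have been evaluated for a given distribution, (2.3.4) and (2.3.5) are simply Cramer's rule for a symmetric two-by-two system. A minimal sketch follows; the function name and the numbers in the usage example are purely illustrative and do not come from any distribution studied here.

```python
def solve_general_weights(a, b, d, e1, e2):
    """Solve A*C1 + B*C2 = E1 and B*C1 + D*C2 = E2 for (C1, C2)."""
    determinant = a * d - b ** 2
    if determinant == 0:
        raise ValueError("the normal equations are singular")
    c1 = (d * e1 - b * e2) / determinant
    c2 = (a * e2 - b * e1) / determinant
    return c1, c2


if __name__ == "__main__":
    # Illustrative coefficients only.
    c1, c2 = solve_general_weights(4.0, 1.0, 3.0, 2.0, 1.0)
    print(c1, c2)
```

Substituting the returned pair back into both equations is a quick way to confirm that the coefficients were assembled correctly.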


All of the equations pertinent to the general square-root
estimator are complex and extremely difficult to analyze critically.
However, the calculations for specific distributions are quite
easy with the aid of a computer, so tables have been prepared
showing the results of applying the general square-root estimator
to the thirteen standard distributions.

Evaluations of the optimum values of $C_1$ and $C_2$ appear in
Table 3. It is immediately obvious that the values are quite
dependent upon the form of the distribution being sampled. For
example, $C_2 = 0$ for all of the gamma distributions, while $C_1 = 0$
for all of the Wishart distributions. It is also interesting
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to  note  that  *  1  for  all  form a  and  sample  sizes  of  the 

Pareto  distribution. 

Table 4 is a comparison of the optimum efficiency ratios of
the two forms of the square-root estimator. The efficiency ratio
of the general square-root estimator is, of course, better in all
cases if the optimum values of $C_1$ and $C_2$ for the specific
distribution are being used. If the specific type of distribution is
unknown it would not be possible to incorporate "workable" values
of $C_1$ and $C_2$ that would be safe for all distributions.

The use of the general square-root estimator should, therefore,
be restricted to those cases where there is a priori knowledge
of the form of the parent distribution.


TABLE 3. VALUES OF $C_1$ AND $C_2$ FOR OPTIMUM EFFICIENCY FOR THIRTEEN DISTRIBUTIONS

TABLE 4. COMPARISON OF OPTIMUM EFFICIENCY RATIOS OF $\hat{y} = C_1\bar{y} + C_2\bar{u}^2$ (TOP)

3. THE CUBE-ROOT ESTIMATOR


3.1 Introduction


When a population consists of both positive and negative
numbers, the square-root estimator is, of course, impossible to
use. The cube root of a negative number is defined, so an estimator
of the form $\hat{y} = (1 - C)\bar{y} + C\bar{v}^3$ is suggested.


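3.2 Definitions


(a) $v_i = y_i^{1/3}$

(b) $\bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i$

(c) $k_1 = \bar{v}$

(d) $k_2 = \frac{1}{n-1}\sum (v_i - \bar{v})^2$

(e) $k_3 = \frac{n}{(n-1)(n-2)}\sum (v_i - \bar{v})^3$

(f) $k_4 = \frac{1}{n(n-1)(n-2)(n-3)}\{(n^3 + n^2)S_4 - 4(n^2 + n)S_3S_1 - 3(n^2 - n)S_2^2 + 12n\,S_2S_1^2 - 6S_1^4\}$

(g) $k_5 = \frac{1}{n(n-1)(n-2)(n-3)(n-4)}\{(n^4 + 5n^3)S_5 - 5(n^3 + 5n^2)S_4S_1 - 10(n^3 - n^2)S_3S_2 + 20(n^2 + 2n)S_3S_1^2 + 30(n^2 - n)S_2^2S_1 - 60n\,S_2S_1^3 + 24S_1^5\}$

(h) $k_6 = \frac{1}{n(n-1)(n-2)(n-3)(n-4)(n-5)}\{(n^5 + 16n^4 + 11n^3 - 4n^2)S_6 - 6(n^4 + 16n^3 + 11n^2 - 4n)S_5S_1 - 15n(n-1)^2(n+4)S_4S_2 - 10(n^4 - 2n^3 + 5n^2 - 4n)S_3^2 + 30(n^3 + 9n^2 + 2n)S_4S_1^2 + 120(n^3 - n)S_3S_2S_1 + 30(n^3 - 3n^2 + 2n)S_2^3 - 120(n^2 + 3n)S_3S_1^3 - 270(n^2 - n)S_2^2S_1^2 + 360n\,S_2S_1^4 - 120S_1^6\}$

(i) $S_j = \sum_{i=1}^{n} (v_i)^j$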
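
The first four k statistics can be computed directly from the power sums $S_j$. The sketch below is a minimal illustration, not part of the original development: the function names are assumptions, and $k_2$ and $k_3$ are written in their equivalent power-sum forms rather than the central-sum forms of (d) and (e).

```python
def power_sums(values, max_order):
    """Return [S_0, S_1, ..., S_max_order] with S_j = sum(v**j)."""
    return [sum(v**j for v in values) for j in range(max_order + 1)]


def k_statistics(values):
    """Return (k1, k2, k3, k4) for a sample of at least four values."""
    n = len(values)
    if n < 4:
        raise ValueError("k4 requires at least four observations")
    _, s1, s2, s3, s4 = power_sums(values, 4)
    k1 = s1 / n
    # Power-sum forms, algebraically equal to definitions (d) and (e).
    k2 = (n * s2 - s1**2) / (n * (n - 1))
    k3 = (n**2 * s3 - 3 * n * s2 * s1 + 2 * s1**3) / (n * (n - 1) * (n - 2))
    # Definition (f).
    k4 = (
        (n**3 + n**2) * s4
        - 4 * (n**2 + n) * s3 * s1
        - 3 * (n**2 - n) * s2**2
        + 12 * n * s2 * s1**2
        - 6 * s1**4
    ) / (n * (n - 1) * (n - 2) * (n - 3))
    return k1, k2, k3, k4
```

For the symmetric sample 1, 2, 3, 4 the third k statistic vanishes, which makes a convenient sanity check.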


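3.3 Derivations of Bias and Error Mean Square


The cube-root estimator will be used in the form
$\hat{y} = (1 - C)\bar{y} + C\bar{v}^3$ with $\bar{y}$ being expressed in terms of
k statistics of the $v_i$'s.

Since $k_3 = \frac{n}{(n-1)(n-2)}\sum (v_i - \bar{v})^3$ it is possible to expand
the last term and solve for $\sum v_i^3$:

$$\sum v_i^3 = \frac{(n-1)(n-2)}{n}k_3 + 3(n-1)k_1k_2 + n k_1^3 ,$$

$$\bar{y} = \frac{1}{n}\sum v_i^3 = \left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)k_3 + 3\left(\frac{n-1}{n}\right)k_1k_2 + k_1^3 ,$$

$$\hat{y} = (1 - C)\left[\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)k_3 + 3\left(\frac{n-1}{n}\right)k_1k_2 + k_1^3\right] + C k_1^3$$

$$\hat{y} = \bar{y} - C\left(\frac{n-1}{n}\right)\left[\left(\frac{n-2}{n}\right)k_3 + 3k_1k_2\right] \tag{3.3.1}$$

$$\hat{y} = (1 - C)\left[\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)k_3 + 3\left(\frac{n-1}{n}\right)k_2k_1\right] + k_1^3 \tag{3.3.1$'$}$$

$$E(\hat{y}) = \bar{Y} - C\left(\frac{n-1}{n}\right)\left[\left(\frac{n-2}{n}\right)E(k_3) + 3E(k_1k_2)\right]$$

$$= \bar{Y} - C\left(\frac{n-1}{n}\right)\left[\left(\frac{n-2}{n}\right)\kappa_3 + 3\left(\frac{1}{n}\kappa_3 + \kappa_1\kappa_2\right)\right]$$

$$= \bar{Y} - C\left(\frac{n-1}{n}\right)\left[\left(\frac{n+1}{n}\right)\kappa_3 + 3\kappa_1\kappa_2\right]$$

$$B(\hat{y}) = -C\left(\frac{n-1}{n}\right)\left[\left(\frac{n+1}{n}\right)\kappa_3 + 3\kappa_1\kappa_2\right] \tag{3.3.2}$$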
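
The expression of $\bar{y}$ in terms of $k_1$, $k_2$, and $k_3$ is an exact algebraic identity, so it can be checked numerically on any sample. The following sketch is illustrative; the function names are assumptions introduced here, and the real cube root of a negative number is taken with `math.copysign`.

```python
import math


def cube_root(y):
    """Real cube root, defined for negative as well as positive y."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


def cube_root_estimator(sample, c):
    """Return (1 - C) * y_bar + C * v_bar**3 for the given sample."""
    n = len(sample)
    y_bar = sum(sample) / n
    v_bar = sum(cube_root(y) for y in sample) / n
    return (1.0 - c) * y_bar + c * v_bar**3


def y_bar_from_k_statistics(sample):
    """Rebuild y_bar from k1, k2, k3 of the cube-root values (n >= 3)."""
    n = len(sample)
    v = [cube_root(y) for y in sample]
    k1 = sum(v) / n
    k2 = sum((x - k1) ** 2 for x in v) / (n - 1)
    k3 = n * sum((x - k1) ** 3 for x in v) / ((n - 1) * (n - 2))
    r1, r2 = (n - 1) / n, (n - 2) / n
    return r1 * r2 * k3 + 3 * r1 * k1 * k2 + k1**3
```

With C = 0 the estimator reduces to the sample mean, and with C = 1 it reduces to the cube of the mean cube root.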


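$$\begin{aligned} V(\bar{y}) = {} & \left(\frac{n-1}{n}\right)^2\left(\frac{n-2}{n}\right)^2 V(k_3) + 9\left(\frac{n-1}{n}\right)^2 V(k_2k_1) + V(k_1^3) \\ & + 6\left(\frac{n-1}{n}\right)^2\left(\frac{n-2}{n}\right)\mathrm{Cov}(k_3, k_2k_1) + 2\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)\mathrm{Cov}(k_3, k_1^3) \\ & + 6\left(\frac{n-1}{n}\right)\mathrm{Cov}(k_1^3, k_2k_1) \end{aligned} \tag{3.3.3}$$

$$\begin{aligned} V(\hat{y}) = {} & (1 - C)^2\left(\frac{n-1}{n}\right)^2\left(\frac{n-2}{n}\right)^2 V(k_3) + 9(1 - C)^2\left(\frac{n-1}{n}\right)^2 V(k_2k_1) \\ & + V(k_1^3) + 6(1 - C)^2\left(\frac{n-1}{n}\right)^2\left(\frac{n-2}{n}\right)\mathrm{Cov}(k_3, k_2k_1) \\ & + 2(1 - C)\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)\mathrm{Cov}(k_3, k_1^3) \\ & + 6(1 - C)\left(\frac{n-1}{n}\right)\mathrm{Cov}(k_1^3, k_2k_1) \end{aligned} \tag{3.3.4}$$

$$\mathrm{EMS}(\hat{y}) = V(\hat{y}) + [B(\hat{y})]^2$$

$$\begin{aligned} \frac{\partial\,\mathrm{EMS}(\hat{y})}{\partial C} = {} & 2(C - 1)\left(\frac{n-1}{n}\right)^2\left(\frac{n-2}{n}\right)^2 V(k_3) + 18(C - 1)\left(\frac{n-1}{n}\right)^2 V(k_2k_1) \\ & + 12(C - 1)\left(\frac{n-1}{n}\right)^2\left(\frac{n-2}{n}\right)\mathrm{Cov}(k_3, k_2k_1) \\ & - 2\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)\mathrm{Cov}(k_3, k_1^3) - 6\left(\frac{n-1}{n}\right)\mathrm{Cov}(k_1^3, k_2k_1) \\ & + 2C\left(\frac{n-1}{n}\right)^2\left[\left(\frac{n+1}{n}\right)\kappa_3 + 3\kappa_1\kappa_2\right]^2 . \end{aligned}$$

Equating the derivative of $\mathrm{EMS}(\hat{y})$ to zero and solving for C
yields

$$C = \frac{A + \left(\frac{n-2}{n}\right)\mathrm{Cov}(k_3, k_1^3) + 3\,\mathrm{Cov}(k_1^3, k_2k_1)}{A + \left(\frac{n-1}{n}\right)\left[\left(\frac{n+1}{n}\right)\kappa_3 + 3\kappa_1\kappa_2\right]^2} \tag{3.3.5}$$

where

$$A = \left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)^2 V(k_3) + 9\left(\frac{n-1}{n}\right)V(k_1k_2) + 6\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)\mathrm{Cov}(k_3, k_1k_2) .$$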

In order to evaluate $\mathrm{EMS}(\hat{y})$ it is necessary to determine
those variance and covariance terms appearing in (3.3.3), (3.3.4),
and (3.3.5). The derivations will be for infinite populations only.


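$$V(k_3) = E(k_3^2) - [E(k_3)]^2 = E(k_3^2) - \kappa_3^2$$

$$= E\left[\frac{1}{n}k_6 + \frac{9}{n-1}k_{42} + \frac{n+8}{n-1}k_{33} + \frac{6n}{(n-1)(n-2)}k_{222}\right] - \kappa_3^2$$

$$= \frac{1}{n}\kappa_6 + \frac{9}{n-1}\kappa_4\kappa_2 + \frac{9}{n-1}\kappa_3^2 + \frac{6n}{(n-1)(n-2)}\kappa_2^3 . \tag{3.3.6}$$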


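$$V(k_1k_2) = E(k_1^2k_2^2) - [E(k_1k_2)]^2$$

$$= E\left[\frac{1}{n^3}k_6 + \frac{2}{n^2}k_{51} + \frac{3n+1}{n^2(n-1)}k_{42} + \frac{2(n+1)}{n^2(n-1)}k_{33} + \frac{1}{n}k_{411} + \frac{4(n+1)}{n(n-1)}k_{321} + \frac{n+1}{n(n-1)}k_{222} + \frac{n+1}{n-1}k_{2211}\right] - \left[E\left(\frac{1}{n}k_3 + k_{21}\right)\right]^2$$

$$= \frac{1}{n^3}\kappa_6 + \frac{2}{n^2}\kappa_5\kappa_1 + \frac{3n+1}{n^2(n-1)}\kappa_4\kappa_2 + \frac{2(n+1)}{n^2(n-1)}\kappa_3^2 + \frac{1}{n}\kappa_4\kappa_1^2 + \frac{4(n+1)}{n(n-1)}\kappa_3\kappa_2\kappa_1 + \frac{n+1}{n(n-1)}\kappa_2^3 + \frac{n+1}{n-1}\kappa_2^2\kappa_1^2 - \left[\frac{1}{n}\kappa_3 + \kappa_2\kappa_1\right]^2$$

$$= \frac{1}{n^3}\kappa_6 + \frac{2}{n^2}\kappa_5\kappa_1 + \frac{3n+1}{n^2(n-1)}\kappa_4\kappa_2 + \frac{n+3}{n^2(n-1)}\kappa_3^2 + \frac{1}{n}\kappa_4\kappa_1^2 + \frac{2(n+3)}{n(n-1)}\kappa_3\kappa_2\kappa_1 + \frac{n+1}{n(n-1)}\kappa_2^3 + \frac{2}{n-1}\kappa_2^2\kappa_1^2 . \tag{3.3.7}$$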


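$$V(k_1^3) = E(k_1^6) - [E(k_1^3)]^2 .$$


Using techniques similar to those used in (22) and (23), this
yields:

$$V(k_1^3) = \frac{1}{n^5}\kappa_6 + \frac{6}{n^4}\kappa_5\kappa_1 + \frac{15}{n^4}\kappa_4\kappa_2 + \frac{9}{n^4}\kappa_3^2 + \frac{15}{n^3}\kappa_4\kappa_1^2 + \frac{54}{n^3}\kappa_3\kappa_2\kappa_1 + \frac{15}{n^3}\kappa_2^3 + \frac{18}{n^2}\kappa_3\kappa_1^3 + \frac{36}{n^2}\kappa_2^2\kappa_1^2 + \frac{9}{n}\kappa_2\kappa_1^4 . \tag{3.3.8}$$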


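$$\mathrm{Cov}(k_3, k_1k_2) = E(k_3k_1k_2) - E(k_3)E(k_1k_2)$$

$$= \frac{1}{n^2}\kappa_6 + \frac{1}{n}\kappa_5\kappa_1 + \frac{n+5}{n(n-1)}\kappa_4\kappa_2 + \frac{6}{n(n-1)}\kappa_3^2 + \frac{6}{n-1}\kappa_3\kappa_2\kappa_1 . \tag{3.3.9}$$

$$\mathrm{Cov}(k_3, k_1^3) = E(k_3k_1^3) - E(k_3)E(k_1^3)$$

$$= \frac{1}{n^3}\kappa_6 + \frac{3}{n^2}\kappa_5\kappa_1 + \frac{3}{n^2}\kappa_4\kappa_2 + \frac{3}{n}\kappa_4\kappa_1^2 . \tag{3.3.10}$$

$$\mathrm{Cov}(k_1k_2, k_1^3) = E(k_1^4k_2) - E(k_1k_2)E(k_1^3)$$

$$= \frac{1}{n^4}\kappa_6 + \frac{4}{n^3}\kappa_5\kappa_1 + \frac{7}{n^3}\kappa_4\kappa_2 + \frac{3}{n^3}\kappa_3^2 + \frac{6}{n^2}\kappa_4\kappa_1^2 + \frac{12}{n^2}\kappa_3\kappa_2\kappa_1 + \frac{3}{n^2}\kappa_2^3 + \frac{3}{n}\kappa_3\kappa_1^3 + \frac{3}{n}\kappa_2^2\kappa_1^2 . \tag{3.3.11}$$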
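
Given the six cumulants of the cube-root distribution, the terms (3.3.6) through (3.3.11) combine into the bias, variance, error mean square, and optimum C of (3.3.2), (3.3.4), and (3.3.5). The Python sketch below is one way to organize that computation for an infinite population. All names are assumptions introduced for illustration, and the coefficients follow the reconstructed equations above.

```python
def moment_terms(kappa, n):
    """Variance/covariance terms (3.3.6)-(3.3.11); kappa = (k1, ..., k6)."""
    k1, k2, k3, k4, k5, k6 = kappa
    m = n - 1
    return {
        "var_k3": k6 / n + 9 * k4 * k2 / m + 9 * k3**2 / m
                  + 6 * n * k2**3 / (m * (n - 2)),
        "var_k1k2": k6 / n**3 + 2 * k5 * k1 / n**2
                    + (3 * n + 1) * k4 * k2 / (n**2 * m)
                    + (n + 3) * k3**2 / (n**2 * m) + k4 * k1**2 / n
                    + 2 * (n + 3) * k3 * k2 * k1 / (n * m)
                    + (n + 1) * k2**3 / (n * m) + 2 * k2**2 * k1**2 / m,
        "var_k1cu": k6 / n**5 + 6 * k5 * k1 / n**4 + 15 * k4 * k2 / n**4
                    + 9 * k3**2 / n**4 + 15 * k4 * k1**2 / n**3
                    + 54 * k3 * k2 * k1 / n**3 + 15 * k2**3 / n**3
                    + 18 * k3 * k1**3 / n**2 + 36 * k2**2 * k1**2 / n**2
                    + 9 * k2 * k1**4 / n,
        "cov_k3_k1k2": k6 / n**2 + k5 * k1 / n + (n + 5) * k4 * k2 / (n * m)
                       + 6 * k3**2 / (n * m) + 6 * k3 * k2 * k1 / m,
        "cov_k3_k1cu": k6 / n**3 + 3 * k5 * k1 / n**2 + 3 * k4 * k2 / n**2
                       + 3 * k4 * k1**2 / n,
        "cov_k1k2_k1cu": k6 / n**4 + 4 * k5 * k1 / n**3 + 7 * k4 * k2 / n**3
                         + 3 * k3**2 / n**3 + 6 * k4 * k1**2 / n**2
                         + 12 * k3 * k2 * k1 / n**2 + 3 * k2**3 / n**2
                         + 3 * k3 * k1**3 / n + 3 * k2**2 * k1**2 / n,
    }


def ems_cube_root(kappa, n, c):
    """Return (bias, variance, EMS) from (3.3.2) and (3.3.4)."""
    k1, k2, k3 = kappa[:3]
    t = moment_terms(kappa, n)
    r1, r2 = (n - 1) / n, (n - 2) / n
    bias = -c * r1 * ((n + 1) / n * k3 + 3 * k1 * k2)
    quad = r2**2 * t["var_k3"] + 9 * t["var_k1k2"] + 6 * r2 * t["cov_k3_k1k2"]
    lin = r2 * t["cov_k3_k1cu"] + 3 * t["cov_k1k2_k1cu"]
    var = (1 - c) ** 2 * r1**2 * quad + t["var_k1cu"] + 2 * (1 - c) * r1 * lin
    return bias, var, var + bias**2


def optimum_c(kappa, n):
    """Weighting constant that minimizes EMS, equation (3.3.5)."""
    k1, k2, k3 = kappa[:3]
    t = moment_terms(kappa, n)
    r1, r2 = (n - 1) / n, (n - 2) / n
    a = r1 * (r2**2 * t["var_k3"] + 9 * t["var_k1k2"] + 6 * r2 * t["cov_k3_k1k2"])
    beta = (n + 1) / n * k3 + 3 * k1 * k2
    return (a + r2 * t["cov_k3_k1cu"] + 3 * t["cov_k1k2_k1cu"]) / (a + r1 * beta**2)
```

A useful consistency check is the case C = 0, where the error mean square must equal the variance of the sample mean of $y = v^3$, that is, $[E(v^6) - (E(v^3))^2]/n$. For a unit exponential $v$ with n = 3 this is (720 - 36)/3 = 228.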


3.4 The Bias of the Cube-Root Estimator

$$B(\hat{y}) = -C\left(\frac{n-1}{n}\right)\left[\left(\frac{n+1}{n}\right)\kappa_3 + 3\kappa_2\kappa_1\right]$$

is a function of the first three moments of the cube-root
distribution and is, therefore, sensitive to large values of these
moments. The third moment of the cube-root distribution is apt
to be small and cause little problem. The effect of $3\kappa_2\kappa_1$ is not
so easily dismissed. The effect of a non-zero mean on the bias
could be severe, especially since the mean square error is increased
by the square of the bias.
3.5 Types of Distributions for which $\mathrm{EMS}(\hat{y})$ Can
be Made Substantially Less than $V(\bar{y})$

Consider  the  distribution  of  errors  that  would  be  encountered 
in  a  corporate  account  audit.  Most,  by  far,  of  the  account  entries 
would  be  correct,  i.e.,  with  zero  error.  A  small  portion  would 
have  errors  and  these  errors  would  be  both  positive  and  negative. 
One  of  the  more  onerous  duties  of  an  auditor  is  to  determine  the 
average  amount  of  error  in  such  accounts  in  order  to  detect  if 
the  total  is  substantially  in  error.  Since  he  is  sampling  for  a 
rare  attribute  (error) ,  his  sample  size  usually  must  be  quite 
large  in  order  to  be  effective.  An  estimator  which  would  contain 
the  same  amount  of  information  with  a  smaller  sample  size  would 
be  valuable. 

In  order  to  evaluate  the  effectiveness  of  the  cube-root 
estimator  in  such  a  situation,  three  types  of  error  distributions 
will  be  considered  using  the  following  definitions: 

P = proportion of population containing error.

S = proportion of the errors which are negative.

(1) The rectangular distribution of errors,

$$f(y) = \begin{cases} PS, & -1 \le y < 0 \\ 1 - P, & y = 0 \\ P(1 - S), & 0 < y \le 1 \end{cases}$$

In this case we are assuming that the positive errors of various
magnitudes are equally likely and the negative errors of various
magnitudes are, likewise, equally likely.


Then

$$g(v) = \begin{cases} 3PS\,v^2, & -1 \le v < 0 \\ 1 - P, & v = 0 \\ 3P(1 - S)\,v^2, & 0 < v \le 1 \end{cases}$$
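
Because $g(v)$ is piecewise polynomial, the raw moments of the cube-root values have the closed form $E(v^r) = 3P[(1 - S) + (-1)^r S]/(r + 3)$ for $r \ge 1$, and the cumulants needed in Section 3.3 follow from the usual moment-to-cumulant recursion. The sketch below assumes the density obtained from the change of variable $y = v^3$; the function names are hypothetical.

```python
from math import comb


def raw_moment_rectangular(r, p, s):
    """E[v**r] for the rectangular error distribution (v = y**(1/3)).

    p -- proportion of the population containing error
    s -- proportion of the errors which are negative
    """
    if r == 0:
        return 1.0
    sign = -1.0 if r % 2 else 1.0   # (-1)**r from the negative branch
    return 3.0 * p * ((1.0 - s) + sign * s) / (r + 3)


def cumulants_from_raw_moments(moments):
    """Convert [m_0 = 1, m_1, ..., m_k] into [kappa_1, ..., kappa_k]."""
    kappa = [0.0] * len(moments)
    for order in range(1, len(moments)):
        correction = sum(
            comb(order - 1, j - 1) * kappa[j] * moments[order - j]
            for j in range(1, order)
        )
        kappa[order] = moments[order] - correction
    return kappa[1:]


def rectangular_cumulants(p, s, max_order=6):
    """First max_order cumulants of v for given P and S."""
    moments = [raw_moment_rectangular(r, p, s) for r in range(max_order + 1)]
    return cumulants_from_raw_moments(moments)
```

The third raw moment of $v$ is the mean of $y$ itself, $P(1 - 2S)/2$, which gives a direct check on the moment formula.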

Figures 3(a), 3(b), and 3(c) illustrate the gains in efficiency
which can be attained for P = .02, .10, and .20, and for S = .95
and .45 at each of these levels of P. The efficiency factor (R) is
shown as a function of C.

Extremely  large  gains  are  possible  in  the  populations  which 
are  only  slightly  unbalanced.  For  small  sample  sizes  large  gains 
are  attainable  even  when  19%  of  the  population  is  in  error  to  one 
side  of  zero  while  only  1%  are  in  error  on  the  other  side.  For 
larger  sample  sizes,  however,  the  possible  gains  in  efficiency 
become  much  smaller  and  the  range  of  values  of  C  which  will  allow 
gain  becomes  much  more  critical. 


Figure 3(a). Relative efficiency (R) as a function of the weighting
constant (C) for the cube root estimator. Rectangular distribution
of errors, n = 5

Figure 3(b). Relative efficiency (R) as a function of the weighting
constant (C) for the cube root estimator. Rectangular distribution
of errors, n = 20
(2) The uniformly decreasing distribution

$$f(y) = \begin{cases} 2PS(1 + y), & -1 \le y < 0 \\ 1 - P, & y = 0 \\ 2P(1 - S)(1 - y), & 0 < y \le 1 \end{cases}$$


This distribution is similar to the rectangular distribution
except that the probability of error decreases with distance from 0.

Letting $v_i^3 = y_i$,

$$g(v) = \begin{cases} 6PS\,v^2(1 + v^3), & -1 \le v < 0 \\ 1 - P, & v = 0 \\ 6P(1 - S)\,v^2(1 - v^3), & 0 < v \le 1 \end{cases}$$


Comparison of Figures 3(d), 3(e), and 3(f) shows the same
results as occur in the rectangular distribution except that the
imbalance does not have as great an effect on the efficiency ratio.
This, of course, is because there is a smaller effect on the first
three central moments.


(3) The parabolic distribution of errors

$$f(y) = \begin{cases} 3PS(1 + y)^2, & -1 \le y < 0 \\ 1 - P, & y = 0 \\ 3P(1 - S)(1 - y)^2, & 0 < y \le 1 \end{cases}$$

Figure 3(d). Relative efficiency (R) as a function of the weighting
constant (C) for the cube root estimator. Uniformly decreasing
distribution of errors, n = 5

Figure 3(e). Relative efficiency (R) as a function of the weighting
constant (C) for the cube root estimator. Uniformly decreasing
distribution of errors, n = 20

Figure 3(f). Relative efficiency (R) as a function of the weighting
constant (C) for the cube root estimator. Uniformly decreasing
distribution of errors, n = 50


The parabolic distribution is similar to the rectangular and
uniformly decreasing distributions except that it reduces the
probabilities of larger errors by the square of the distance from
zero.

Letting $v_i^3 = y_i$,

$$g(v) = \begin{cases} 9PS(v + v^4)^2, & -1 \le v < 0 \\ 1 - P, & v = 0 \\ 9P(1 - S)(v - v^4)^2, & 0 < v \le 1 \end{cases}$$


Figures 3(g), 3(h), and 3(i) illustrate once again the same
basic results shown by the rectangular and uniformly decreasing
distributions. The better the balance, the greater the gain.

The similarity in the results of these distributions serves
to indicate that for small sample sizes there is a value of C
which will allow the cube-root estimator to be used on a class of
distributions. For larger sample sizes it is important to have
some idea of the amount of imbalance before choosing C. It is
possible to estimate P and S from the sample, a posteriori, and
to choose C from the results. This will change the mean square
error of the estimator but it will allow some hedging against a
loss of efficiency.
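
One simple way to form the a posteriori estimates of P and S is to use the observed proportions in the sample. The sketch below is an assumption about how this could be done, not a procedure specified in the text; the function name is hypothetical, and the estimate of S is undefined when the sample contains no errors.

```python
def estimate_p_and_s(sample):
    """Return (P_hat, S_hat) from a sample of account errors.

    P_hat -- observed proportion of non-zero errors
    S_hat -- observed proportion of the non-zero errors that are negative,
             or None when the sample contains no errors at all
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    nonzero = [y for y in sample if y != 0]
    p_hat = len(nonzero) / n
    if not nonzero:
        return p_hat, None
    s_hat = sum(1 for y in nonzero if y < 0) / len(nonzero)
    return p_hat, s_hat
```

With small samples from populations that are predominantly zero, both estimates will be coarse, which is consistent with the caution above that choosing C this way changes the mean square error of the estimator.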
Figure 3(g). Relative efficiency (R) as a function of the weighting
constant (C) for the cube root estimator. Parabolic distribution of
errors, n = 5

Figure 3(h). Relative efficiency (R) as a function of the weighting
constant (C) for the cube root estimator. Parabolic distribution of
errors, n = 20

Figure 3(i). Relative efficiency (R) as a function of the weighting
constant (C) for the cube root estimator. Parabolic distribution of
errors, n = 50


3.6 The Use of the Cube-Root Estimator to Estimate
Changes in the Mean

Small sample surveys are often used to detect if the mean of
a population has changed over time or after some treatment.
Consider, for example, a population which has previously been
surveyed in total or, at least, by a very large sample. Then at
a later date a small sample is taken to detect if the mean has
changed. Let $\mu_0$ be the prior mean of the population, $X_i$ be
an observation made in the later survey, and $y_i = X_i - \mu_0$. Then
$y_i$ is distributed identically with $X_i$ with the exception that
$\bar{Y} = \bar{X} - \mu_0$.

Utilizing the cube-root estimator on the sample values of y
would produce an estimator $\hat{y} = (1 - C)\bar{y} + C\bar{v}^3$ with the properties
that have previously been described. If the values of the first
six moments of the cube-root distribution can be estimated by using
the information from the original survey it is possible to
predetermine a value of C which is apt to give good results.
Further, if the distribution is fairly symmetrical the bias and
mean square error of the cube-root estimator may be quite small
compared to those of $\bar{y}$.


3.7  A  Simulation  to  Verify  the  Efficiency  of  the 
Cube-Root  Estimator 


In order to verify that the cube-root estimator does, indeed,
attain the claimed efficiencies, a set of account errors was
obtained from the auditing records of a wholesaling firm. The
data shown in Table 5 are a sample of 100 account errors, but
they will be used as if they constitute an entire population.


TABLE 5. DISTRIBUTION OF ERRORS IN 100 AUDITED
ACCOUNTS OF A WHOLESALE FIRM

    Error Size ($)
          y               v            f

         0             0.00000        90
        -.52           -.804145        1
        -.80           -.928318        1
       -1.00          -1.000000        2
       -2.00          -1.259921        1
       -3.00          -1.442250        1
        +0.10          +0.464159       1
        +0.40          +0.736806       1
        +1.00          +1.000000       1
       +10.00          +2.154435       1

Original population parameters:
    $\bar{Y} = .0318$,  $\sigma^2 = 1.1698$

Cube-root transformed population parameters:
    $\bar{V} = .0016$,  $\mu_2 = .1553$,  $\mu_3 = .0403$,
    $\mu_4 = .3319$,  $\mu_5 = .3865$,  $\mu_6 = 1.2169$

Referring to Figure 3(a) it can be seen that C = 1 is both
safe and likely to produce a small mean square error when n = 3.
There is the added advantage that the cube-root estimator is quite
simple to calculate when the weighting constant is equal to one.
Equations (3.3.2) and (3.3.4) reveal that when n = 3 and C = 1

$$B(\hat{y}) = -.0302 ,$$

$$V(\hat{y}) = .0101$$

and

$$\mathrm{EMS}(\hat{y}) = .0110 .$$

Through the use of an electronic computer every combination of
the one hundred values taken three at a time was picked and $\hat{y}$
was calculated for each combination. Calculating the moments
of these actual sample values yielded:

$$E(\hat{y}) = .00144$$

$$B(\hat{y}) = .0014 - .0318 = -.0304$$

$$\mathrm{EMS}(\hat{y}) = .0081 .$$


The error mean square calculated from the actual sample values
is smaller than equations (3.3.2) and (3.3.4) predicted, indicating
that the finite population correction, which was ignored in the
development of the cube-root estimator, has more influence in
the case of the cube-root estimator than it has in the case of the
sample mean.
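
The complete enumeration described above is small enough to repeat on any modern machine. The following Python sketch transcribes the errors and frequencies of Table 5 and visits all 161,700 combinations of three accounts; the function and variable names are assumptions introduced for illustration, and the cube roots are recomputed from the y values rather than copied from the table.

```python
import math
from itertools import combinations

# (error y in dollars, frequency f) pairs transcribed from Table 5.
TABLE_5 = [
    (0.0, 90), (-0.52, 1), (-0.80, 1), (-1.00, 2), (-2.00, 1),
    (-3.00, 1), (0.10, 1), (0.40, 1), (1.00, 1), (10.00, 1),
]


def cube_root(y):
    """Real cube root, valid for negative y."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


def enumerate_estimator(table, n=3, c=1.0):
    """Enumerate every sample of size n drawn without replacement.

    Returns (expectation, bias, error mean square) of the estimator
    (1 - C) * y_bar + C * v_bar**3 over all equally likely samples.
    """
    population = [y for y, freq in table for _ in range(freq)]
    pop_mean = sum(population) / len(population)
    roots = [cube_root(y) for y in population]
    total = 0.0
    total_sq_error = 0.0
    count = 0
    for idx in combinations(range(len(population)), n):
        y_bar = sum(population[i] for i in idx) / n
        v_bar = sum(roots[i] for i in idx) / n
        estimate = (1.0 - c) * y_bar + c * v_bar**3
        total += estimate
        total_sq_error += (estimate - pop_mean) ** 2
        count += 1
    expectation = total / count
    return expectation, expectation - pop_mean, total_sq_error / count
```

Calling `enumerate_estimator(TABLE_5)` should reproduce, to rounding, the expectation, bias, and error mean square quoted above for n = 3 and C = 1.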
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4.  USES  OF  ROOT-ESTIMATORS  IN  STRATIFIED  SAMPLING 


4.1  Introduction 


Stratified random sampling is a well known and commonly used
sampling technique wherein a random sample is drawn from each
of the mutually exclusive strata (or subpopulations) of the
population. This technique is particularly useful on populations
which are naturally subdivided into subpopulations, each of which
is more homogeneous than the whole. In such a case each stratum
mean is estimated by the mean of individual observations drawn
from that stratum. If the stratum sizes are known then the
stratum totals are estimable, and through them the population total
and mean are estimable. Estimates of a population mean or total
obtained in this fashion have a variance that is smaller than
the variance of estimates obtained from a simple random sample of
the whole population. When each of the stratum means can be
estimated with smaller mean square error through use of a biased
estimator, it would seem that these biased estimators could be
used to good advantage in the estimation of the population mean.

As is shown below, this is not the case.


4.2  Definitions 


(a) $L$ = number of strata in population


(b) $N_i$ = number of elements in the $i$th stratum
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The  variance  of  y  is,  then 
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and 


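$$\mathrm{EMS}(\hat{Y}) = V(\hat{Y}) + [B(\hat{Y})]^2 = \sum w_i^2 V(\hat{y}_i) + \left[\sum\left(w_i - \frac{N_i}{N}\right)\bar{Y}_i + \sum w_i B_i\right]^2 .$$

It is apparent that letting $w_i = N_i/N$ will reduce the error mean
square to a minimum under the given conditions, so that

$$\mathrm{EMS}(\hat{Y}) = \sum \frac{N_i^2}{N^2} V(\hat{y}_i) + \left[\sum \frac{N_i}{N} B_i\right]^2 .$$

If we now use the relationship $V(\hat{y}_i) = \mathrm{EMS}(\hat{y}_i) - B_i^2$, the $\mathrm{EMS}(\hat{Y})$
becomes

$$\mathrm{EMS}(\hat{Y}) = \sum \frac{N_i^2}{N^2}\left[\mathrm{EMS}(\hat{y}_i) - B_i^2\right] + \left[\sum \frac{N_i}{N} B_i\right]^2$$

$$= \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left[\mathrm{EMS}(\hat{y}_i)\right] + \frac{1}{N^2}\sum_{i=1}^{L}\sum_{l \ne i} N_i N_l B_i B_l$$

$$= \frac{1}{N^2}\left[\sum_{i=1}^{L} N_i^2\left[\mathrm{EMS}(\hat{y}_i)\right] + \sum_{i=1}^{L}\sum_{l \ne i} N_i N_l B_i B_l\right] .$$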

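4.4 Investigation of $\mathrm{EMS}(\hat{Y})$ Compared to $V(\bar{y}_{st})$


If $\bar{y}_{st} = \frac{1}{N}\sum N_i \bar{y}_i$, and each $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$, which is unbiased
for $\bar{Y}_i$, then

$$\mathrm{EMS}(\bar{y}_{st}) = \frac{1}{N^2}\sum N_i^2 V(\bar{y}_i) = V(\bar{y}_{st}) , \quad \text{since all } B_i = 0 ,$$

$$\frac{\mathrm{EMS}(\hat{Y})}{V(\bar{y}_{st})} = \frac{\sum N_i^2\left[\mathrm{EMS}(\hat{y}_i)\right]}{\sum N_i^2 V(\bar{y}_i)} + \frac{\sum_{i}\sum_{l \ne i} N_i N_l B_i B_l}{\sum N_i^2 V(\bar{y}_i)} .$$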


This  makes  R 
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which  is  a  most  unfortunate  result.  The  second  term  is  the  sum 
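of all the $L(L-1)$ cross-products of the $N_iB_i$'s divided by $\sum N_i^2 V(\bar{y}_i)$,
which contains only $L$ terms. In order to clear away some of the
confusing factors, assume that each stratum has the same number
of elements and equal variance. Then each $N_i = N/L$ and

$$\mathrm{EMS}(\hat{Y}) = \frac{1}{N^2}\left\{L\,\frac{N^2}{L^2}\left[\mathrm{EMS}(\hat{y})\right] + L(L - 1)\frac{N^2}{L^2}B^2\right\} = \frac{1}{L}\left\{\mathrm{EMS}(\hat{y}) + (L - 1)B^2\right\}$$

and

$$V(\bar{y}_{st}) = \frac{1}{N^2}\,L\,\frac{N^2}{L^2}\,V(\bar{y}) = \frac{1}{L}V(\bar{y}) .$$

This, then, makes

$$R = \frac{\mathrm{EMS}(\hat{y})}{V(\bar{y})} + \frac{(L - 1)B^2}{V(\bar{y})} .$$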


Therefore, if the biased estimator $\hat{y}$ is sufficiently efficient
compared to $\bar{y}$ and $L$ is small, the resultant overall efficiency may be
comparable. However, as $L$ increases $\frac{(L-1)B^2}{V(\bar{y})}$ is certain to become
large enough to cause a reduction in efficiency below that of $\bar{y}_{st}$.

It is easily seen, therefore, that estimators which are biased
are rather dangerous to use for estimating the population mean or
total when utilizing stratified random sampling. This is especially
true when all $B_i$'s are in the same direction as is the case with
the square root estimator.
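
The growth of the ratio R with the number of strata can be tabulated directly from the equal-strata expression. The sketch below is illustrative only: it assumes equal stratum sizes, a common within-stratum bias B, and a common variance, as in the simplification above, and the function names are hypothetical.

```python
import math


def stratified_ratio(ems_stratum, bias_stratum, var_mean_stratum, num_strata):
    """R = EMS(y_hat)/V(y_bar) + (L - 1) * B**2 / V(y_bar)."""
    return (
        ems_stratum / var_mean_stratum
        + (num_strata - 1) * bias_stratum**2 / var_mean_stratum
    )


def largest_safe_number_of_strata(ems_stratum, bias_stratum, var_mean_stratum):
    """Largest L for which R <= 1.

    Returns 0 when the biased estimator is already worse within a
    single stratum, and math.inf when the bias is exactly zero.
    """
    if ems_stratum > var_mean_stratum:
        return 0
    if bias_stratum == 0:
        return math.inf
    gain = var_mean_stratum - ems_stratum
    return 1 + math.floor(gain / bias_stratum**2)
```

For example, a within-stratum estimator with EMS of 0.5, bias of 0.25, and a sample-mean variance of 1.0 halves the error in a single stratum, yet R reaches 1 at nine strata and exceeds it from ten strata onward.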




In the case of the cube-root estimator it is possible for the
bias terms to be positive for some strata and negative for others.
However, the purpose of such a survey, usually, is to detect a
consistent bias in error, the very condition which will cause the
cube-root estimator to be inefficient.

Root-estimators, therefore, are not recommended for use in
estimating population means or totals in stratified sampling. They
are recommended for use in the estimation of strata means and
totals when the $n_i$ are small and the distributions within strata
are apt to be positively skewed.

5.  CONCLUSION 
5.1 Summary

The square-root estimator of the form $\hat{y} = (1 - C)\bar{y} + C\bar{u}^2$,
$u_i = \sqrt{y_i}$, has been developed for populations consisting of all
positive numbers. It was found that for small sample sizes from
populations with a large positive skewness there is an optimum
value of C ($C_0$) which will make the mean square error of $\hat{y}$
smaller than that of the sample mean. Indeed, it was found that
any value of C between zero and $2C_0$ will have this effect to some
extent and that, for a particular sample size, values of C can
be determined which will produce smaller mean square errors for
a wide class of positively skewed distributions.

The general square-root estimator of the form $\hat{y} = C_1\bar{y} + C_2\bar{u}^2$
was also investigated. This form was found to produce improvement
in the mean square error at the optimum values of $C_1$ and $C_2$.
The wide variability in the values of $C_1$ and $C_2$ between types of
distributions made the use of the general square-root estimator
a bit unsafe if the form of the distribution is not known a
priori.

The cube-root estimator of the form $\hat{y} = (1 - C)\bar{y} + C\bar{v}^3$,
$v_i = y_i^{1/3}$, was developed for use on populations consisting of
both positive and negative values. It was found that the mean
square error of this estimator is quite sensitive to asymmetry
in the population. However, for populations of errors which are
predominantly zero the cube-root estimator performed quite well
in comparison to $\bar{y}$. It also showed promise for the estimation
of small changes in the means of populations for which a previous
large sample survey has established good estimators of the higher
moments.

An  investigation  of  the  use  of  biased  estimators  in 
stratified  sampling  indicated  that  root-estimators  can  be  used 
for  the  estimation  of  within  stratum  means  and  totals  with  good 
results.  They  should  not  be  used  for  estimating  the  population 
mean  or  total  in  stratified  sampling,  however,  because  the  bias 
accumulates  in  the  total  and  overcomes  the  gains  made  within  the 
individual  strata. 


5.2  Future  Research 

Although the square-root and the cube-root estimators show
promise for practical application in small sample estimation,
there are several aspects of the problem which need further
investigation. Of primary importance is a better description
of the classes of distributions for which the root estimators
are advantageous. The sampling practitioner could make use of a
more complete set of model distributions than those exhibited
in this paper. It also appears likely that the square-root
estimator could be improved by the addition of a constant to
populations which have values between zero and one.


This  paper  is,  admittedly,  but  a  beginning  in  the  investigation 
of  biased  estimators  which  exhibit  a  reduced  mean  square  error. 
However,  we  are  hopeful  that  the  properties  that  have  been 
determined  for  these  estimators  will  encourage  further  investigation. 
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